The Bell System Technical Journal 

July, 1926 
The Power of Fundamental Speech Sounds 

By C. F. SACIA and C. J. BECK 

Synopsis: This paper describes the continuing work on speech power by 
means of oscillographic studies of vowels, semi-vowels and consonants. 
A previous paper considered the characteristics of a few individual sounds 
from the power standpoint, but the principal emphasis was placed upon 
speech as a whole. In this later analysis, sounds are considered individually 
on the basis of instantaneous and mean power. A practical application 
of the results is suggested. 

CONTINUING the work done on speech power by means of
power oscillograms, 1 we have made additional reductions in the
data relative to the vowels, semi-vowels and consonants and have also
prepared a smaller amount of data on the power of the semi-vowels
and the consonants from the amplitude oscillograms. 2 This is a
preliminary study of the subject, at least in so far as the latter two
classes of sounds are concerned, for these records of speech sounds were
made to show all sounds in their true relative value; hence the consonant
sounds, being greatly inferior to the vowels in power, were measurable to
a correspondingly smaller degree of accuracy. We have gathered such
data as the existing records could yield, pending the completion of
plans for a more comprehensive study of consonants.

Stop consonants are not so well characterized by the power data as 
are other types. The unvoiced stop consonants have two properties: 
a puff whose main frequency component is of the order of 50 cycles 
with a few ripples of high frequency; and a modifying effect upon 
the beginning or end of the vowel which immediately precedes or suc- 
ceeds it. Hence, such a consonant is more of a controlling factor and 
lacks the essential properties of a discrete sound. In giving the data 
on the puff where it is measurable, we separate the low and high fre- 
quency components. In the case of the voiced stop consonants the 
vocal cord vibrations give the consonant more character of its own. 

Mean Power and Peak Power 

In the paper on speech power and energy, the "mean power," P_m,
was derived (in the case of the vowel sounds) as the mean of the power
taken throughout the interval of the vocal cycle. By the assumption
of an appropriate arbitrary interval instead, say of the order of
one-hundredth of a second, the definition applies as well to consonant
sounds and in addition has the same practical significance as that of
the mean power of a vowel.

1 B. S. T. J. Vol. IV No. 4. "Speech Power and Energy," by C. F. Sacia.

2 B. S. T. J. Vol. IV No. 4. "Sounds of Speech," by I. B. Crandall.

Mean power is thus a variable function of time, starting from zero,
rising to a maximum and eventually falling to zero again as the sound
is being uttered. 3 In studying an aggregate of speech sounds it is
impracticable to have the final results in terms of these mean power
curves; the most important discriminant of such a curve of any sound
is its maximum ordinate, P_m. This value was used in the earlier study
and has been given the name "syllabic power" when used in connection
with the syllable as a whole. In the present case we shall abbreviate
by simply calling it the "mean power of the sound." Similarly, when
we are considering the consonant apart from the rest of the syllable we
select the maximum value of P_m for that consonant.

Likewise, in considering the instantaneous power of a sound we 
select the height of the greatest peak occurring therein and for con- 
venience we call it the "peak power." 
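
The definitions above may be illustrated numerically. The following is a minimal sketch in Python, assuming that the instantaneous power is available as equally spaced samples (the original records were oscillograms, not sampled data). The function names, the sampling rate and the steady test tone are hypothetical; they serve only to show how the maximum ordinate of the mean power curve and the peak power are picked out.

```python
import math

def mean_power_curve(inst_power, sample_rate, interval_s=0.01):
    """Running mean of instantaneous power over a fixed interval.

    interval_s defaults to one-hundredth of a second, the arbitrary
    interval suggested in the text for sounds without a vocal cycle.
    """
    n = max(1, min(len(inst_power), round(interval_s * sample_rate)))
    total = sum(inst_power[:n])
    curve = [total / n]
    for i in range(n, len(inst_power)):
        # Slide the window one sample forward.
        total += inst_power[i] - inst_power[i - n]
        curve.append(total / n)
    return curve

def mean_power_of_sound(inst_power, sample_rate, interval_s=0.01):
    """Maximum ordinate of the mean power curve."""
    return max(mean_power_curve(inst_power, sample_rate, interval_s))

def peak_power(inst_power):
    """Height of the greatest peak of the instantaneous power."""
    return max(inst_power)

# Hypothetical example: a steady 100-cycle tone of unit pressure amplitude,
# with power taken as proportional to the square of the pressure.
SAMPLE_RATE = 8000  # samples per second (assumed)
pressure = [math.sin(2 * math.pi * 100 * k / SAMPLE_RATE) for k in range(800)]
inst_power = [p * p for p in pressure]

print("mean power of the sound:", mean_power_of_sound(inst_power, SAMPLE_RATE))
print("peak power:", peak_power(inst_power))
```

For a steady sinusoid the peak power is twice the mean power; for speech sounds the two quantities differ by a factor that depends upon the wave shape.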

All the averages hereinafter tabulated are the arithmetic averages 
of such maximum ordinates and not the integrated averages. 

Normal and Conversational Values 

We specify "normal" values as those derived from monosyllables
spoken disconnectedly without accent but also without being slighted,
while "conversational" values are derived from ordinary conversational
speech. It does not follow that the arithmetic average of the
conversational values for a given sound should equal the average of the
normal values, for the reason that some sounds are slighted much more
frequently than others, as we shall see later.

The Consonants and Semi-Vowels 

Of these sounds two independent sets of data are available: in- 
stantaneous peak power and mean power. The former is summarized 
in Table I. To explain the table in detail we take as an example the 
consonant, "t" as in "tap." There being one observation upon each 
of two speakers, the greatest observation showed 19 microwatts (peak) 
from the lips of the one speaker while the other speaker reached a peak 
of 13 microwatts, and the average of these two is 16. As in the paper 
on Speech Power and Energy, the corresponding values of power
intensity in microwatts per square centimeter at the condenser
transmitter are given in the group at the right. Since the relating factor
is about 127, the intensities 0.15, 0.11 and 0.13 are the first three
numbers respectively divided by 127.

3 See "Speech Power and Energy," Fig. 1, page 628, for comparison of
instantaneous and mean powers.

TABLE I

Normal Values of Peak Power in Microwatts for Two Speakers

(A) Consonants

   Consonant       Total from Voice        Per Cm² at Trans.
Symbol  Key        Max.   Min.   Ave.      Max.   Min.   Ave.
b       bat           7      7      7      0.06   0.05   0.06
p       pot           7      6      6      0.06   0.05   0.05
*p      pot         128            64      1.04   0.     0.52
d       dot           7      1      4      0.06   0.01   0.04
t       tap          19     13     16      0.15   0.11   0.13
g       get           9      7      8      0.07   0.06   0.06
k       kit           9      4      6      0.07   0.03   0.05
dh      that         10      8      9      0.08   0.06   0.07
th      thin          1             1      0.01   0.     0.01
*th     thin         30            15      0.24          0.12
v       vat          29     21     25      0.23   0.17   0.20
*f      for          53     10     31      0.42   0.08   0.25
f       for           4      2      3      0.04   0.02   0.03
j       jot          26     23     24      0.21   0.19   0.20
ch      chat         61     43     52      0.49   0.35   0.42
zh      azure        53     23     38      0.43   0.19   0.31
sh      shot        133     97    115      1.08   0.79   0.93
z       zip          42     21     31      0.34   0.17   0.25
s       sit          54      8     31      0.43   0.06   0.25

* Low frequency puff.

(B) Semi-Vowels

   Semi-Vowel      Total from Voice        Per Cm² at Trans.
Symbol  Key        Max.   Min.   Ave.      Max.   Min.   Ave.
l       let         226     37    131      1.83   0.29   1.06
ng      ring        169     25     97      1.36   0.20   0.78
n       no           74     21     47      0.59   0.17   0.38
m       me          198     23    111      1.60   0.18   0.89

Note: For these two speakers, the peak power of the succeeding vowel was as
follows:

               Total    Per Cm²
u (tool)         206      1.7
a (tap)          860      6.8
e (teem)         241      1.9
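
The reading of a row of Table I can be retraced with a few lines of arithmetic. The sketch below, in Python, takes the two observations on "t" (tap) quoted in the text and applies the relating factor; the function names are hypothetical, and the factor is taken as exactly 127 although the text gives it only as "about 127," so that the last digit of a computed intensity need not always agree with the tabulated one.

```python
RELATING_FACTOR = 127.0  # "about 127" in the text; treated here as exact (assumption)

def average_peak(observations):
    """Arithmetic average of the peak powers observed for one sound."""
    return sum(observations) / len(observations)

def intensity_at_transmitter(total_power_uw, factor=RELATING_FACTOR):
    """Microwatts per square centimeter at the transmitter, given the
    total power from the voice in microwatts."""
    return total_power_uw / factor

# The two observations on "t" (tap), one for each speaker, in microwatts.
t_peaks = [19.0, 13.0]
t_average = average_peak(t_peaks)

print("average peak power:", t_average)
print("intensity of maximum:", round(intensity_at_transmitter(max(t_peaks)), 2))
print("intensity of average:", round(intensity_at_transmitter(t_average), 2))
```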

These values were derived by measuring the amplitudes of the
above-mentioned oscillograms of the acoustic pressure. The maximum or
peak amplitudes of the consonant and the succeeding vowel were first
measured; the square of this ratio between these is the ratio of the
corresponding peak powers. Now the approximate peak powers of
these vowels for the two speakers were found (see note under Table I)
from the power oscillograms used in our study of speech power. Hence
from the product we derive the approximate peak power of the
consonant (or semi-vowel). Direct measurement of peak power from the
latter oscillograms was impracticable because of the low sensitivity of
the instantaneous power recorder 4 and the before-mentioned fact that
the power of the consonants and semi-vowels is low relative to that
of the vowels.
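
The procedure just described amounts to scaling the known peak power of the vowel by the square of an amplitude ratio. A minimal sketch in Python follows; the function name and the amplitude readings are hypothetical, and only the vowel peak power of 860 microwatts is taken from the note under Table I.

```python
def consonant_peak_power(consonant_amplitude, vowel_amplitude, vowel_peak_power_uw):
    """Approximate peak power of a consonant (or semi-vowel).

    The amplitudes are peak deflections read from the same pressure
    oscillogram, in any common unit; the vowel peak power comes from the
    power oscillograms.
    """
    if vowel_amplitude <= 0:
        raise ValueError("vowel amplitude must be positive")
    power_ratio = (consonant_amplitude / vowel_amplitude) ** 2
    return power_ratio * vowel_peak_power_uw

# Hypothetical readings: a consonant deflection of 1.5 divisions against a
# vowel deflection of 10 divisions, the vowel peaking at 860 microwatts.
print(consonant_peak_power(1.5, 10.0, 860.0))
```

Because the ratio enters as its square, an error of ten per cent in reading a small consonant deflection appears as an error of about twenty per cent in the derived power.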

Since frequencies of the order of 50 cycles are of negligible importance 
in speech, the 50-cycle puff has been separated from the other compon- 
ents in the case of the unvoiced stop consonants. This is justified by 
the fact that the utterances of such a sound by two speakers may seem 
exactly alike to the careful listener, whereas a large puff may be present 
in one case and none in the other. 

The values thus far considered represent "normal" values in speech 
— not accented and yet not slighted. 

TABLE II

Conversational Values of Mean Power in Microwatts for 16 Speakers

(A) Consonants

   Consonant      Speaker's Power     Number of       Per Cm² at Trans.
                                      Measurable
Symbol  Key        Max.     Av.       Observations     Max.       Av.
d       dot         2.9    0.08             4          0.023    0.0006
t       tap         6.0    0.14            14          0.049    0.0012
k       kit         4.8    0.34            20          0.039    0.0027
v       vat         2.4    0.03             1          0.019    0.0002
f       for         3.6    0.08             1          0.029    0.0006
j       jot         3.6    0.47             8          0.029    0.0038
ch      chat        7.9    1.44            19          0.064    0.0116
sh      shot        6.0    1.83             9          0.049    0.0148
z       zip         7.2    0.72            31          0.058    0.0058
s       sit         8.7    0.94           115          0.070    0.0076

(B) Semi-Vowels

   Semi-Vowel     Speaker's Power     Number of       Per Cm² at Trans.
                                      Measurable
Symbol  Key        Max.     Av.       Observations     Max.       Av.
l       let         9.6    0.33            13          0.078    0.0026
ng      ring        3.6    0.35             2          0.029    0.0028
n       no         18.0    2.11           146          0.145    0.0170
m       me         16.8    1.85            31          0.136    0.0149


4 In recording the power, separate vibrators had been used for instantaneous and
mean powers.

Our measurements of mean power, on the other hand, were made
from power records of conversational speech, with a greater variety
of observations and speakers. Stress, therefore, plays an important
part here.

In Table II is given a compact summary of the direct measurements 
made on the power oscillograms. Thus consider "d" as in "dot." 
2.9 microwatts was the greatest observed value for any speaker, while 
the average of all observations (including accented and unaccented 
utterances) was but 0.08. Only four observations, however, were 
large enough to be measured. As before, we give the corresponding 
intensities in microwatts per square centimeter at the transmitter in 
the next two columns. 

To show the occurrence of stress in the utterance of these sounds in
ordinary speech, we give in Fig. 1 the stress frequency-distribution
curves 5 of several oft-occurring sounds. These curves are derived in
the same manner as were the syllabic stress curves in the study of
speech power. They exhibit the marked degree in which the
consonants differ in stress for ordinary speech. For example, among the
consonant sounds, "t" and "sh" represent extreme types. The former is
either slighted or strongly accented with but little intermediate
gradation, while the blunt characteristic of the latter indicates the most
nearly uniform distribution of stress into all shades from zero to
maximum. Similarly with the three semi-vowels shown, "l" and "m" are
extreme types.
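
A stress frequency-distribution curve of this kind gives, for each relative power level between zero and one, the relative number of observations exceeding that level. The Python sketch below shows one way such a curve might be computed; the function name and the sample observations are hypothetical and are not taken from the records.

```python
def stress_distribution(powers, levels):
    """For each relative level (0 to 1), return the fraction of observations
    whose power, taken relative to the greatest observation, exceeds it."""
    greatest = max(powers)
    relative = [p / greatest for p in powers]
    return [
        (level, sum(1 for r in relative if r > level) / len(relative))
        for level in levels
    ]

# Hypothetical mean-power observations of one sound, in microwatts.
# Slighted utterances are entered as zero.
observations = [0.0, 0.0, 1.0, 4.0]

for level, fraction in stress_distribution(observations, [0.0, 0.5, 1.0]):
    print(level, fraction)
```

A sound that is either slighted or strongly accented yields a curve that drops sharply near zero and then remains nearly level, whereas a uniform distribution of stress yields a nearly straight diagonal.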

The Vowels 

Some attention was given to vowel power in the other paper where
under the heading of "Relative Power of Vowels" (on page 634) were
charted what we have classified as normal values of mean power.
These were derived from the mean power curves of disconnected
monosyllables. Although they were charted separately for male and female
voices, we shall not differentiate between the two in the following. In
Tables III and IV are summarized the four sets of data based upon the
speech from 16 voices. Here we see the influence of stress by comparing
the conversational and normal values. This effect is noteworthy in
the case of "o" (ton), "a" (tap) and "i" (tip), which average
considerably less power in conversational speech than in normal syllables.
Another point of interest is the comparison of peak and mean values.
For example, in the normal data, the ratio of peak to mean (i.e. the
square of the peak factor) is greater for centrally located vowels and is
greatest for "a" (tap) as was mentioned in the earlier paper. Referring
to the normal values of peak power we find a surprising degree of
regularity in the increase of these values from a minimum for "u"
(tool) to a maximum for "a" (tap) and the falling off again to a minimum
for "e" (teem). The one slight irregularity is the vowel "o" (ton).
(We have omitted "r" (err) from this comparison because it has no
well defined place on the Vietor triangle which forms the basis for this
arrangement of the other vowels.)

5 The abscissa represents the relative number of observations (s/s) whose relative
power values exceed the magnitude of the ordinate, n, a numeric varying between
zero and one.
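
The statement that the ratio of peak to mean power is the square of the peak factor can be verified on any wave. The sketch below, in Python, does so for a plain sinusoid, assuming that power is proportional to the square of the pressure and taking the peak factor as the ratio of peak to root-mean-square pressure; the function names and the test wave are hypothetical.

```python
import math

def peak_factor(pressure):
    """Ratio of the peak pressure to the root-mean-square pressure."""
    rms = math.sqrt(sum(x * x for x in pressure) / len(pressure))
    return max(abs(x) for x in pressure) / rms

def peak_to_mean_power(pressure):
    """Ratio of peak power to mean power, with power taken as pressure squared."""
    inst_power = [x * x for x in pressure]
    return max(inst_power) / (sum(inst_power) / len(inst_power))

# One full cycle of a sinusoid (hypothetical test wave).
N = 1000
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]

print("peak factor:", peak_factor(sine))
print("peak/mean power:", peak_to_mean_power(sine))
```

For the sinusoid the peak factor is the square root of two and the power ratio is two; a vowel wave with sharp peaks gives larger values of both.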

TABLE V

SPEECH SOUNDS

Relative Power, Arbitrary Units

                          A                 B                 C
                      Mean Power        Peak Power      Relative Power
Speech                Conversational    Normal values   Attenuation
Sound     Key         values for 16     for 2           to give 80%
                      speakers          speakers        Articulation
o         talk            1870               688             826
a         top             1380              1430             474
o         tone             875               630             619
a         tape             808               632             567
e         ten              664               975             364
o         ton              616               688             474
u         tool             532               344             349
e         teem             484               402             421
r         err              384                 -             924
a         tap              366              2170             645
i         tip              346               688             295
n         no                84                78              36
m         me                74               185              38
sh        shot              73               192             216
ch        chat              58                87              64
s         sit               38                51              11
z         zip               29                52              17
j         jot               19                41              98
ng        ring              14               162             134
k         kit               14                10              43
l         let               13               218             157
t         tap                6                26              32
d         dot                3                 7              60
f         for                3                 6               9
v         vat                1                41              13
u         took               -               688             347
zh        azure              -                63               -
dh        that               -                15               -
g         get                -                13              60
b         bat                -                11              30
p         pot                -                11              24
th        thin               -                 1               1

Note: The dash indicates that observations were not available.

Relative Power of Speech Sounds

A direct comparison of most of the fundamental sounds will now be
made. In Table V-A are shown the conversational values (averaged)
of the mean power for each sound for 16 speakers. The units are taken
arbitrarily in order to show only the relative values. As might have
been expected, the vowels rank the highest, the semi-vowels next and
the consonants the lowest, although we find a few consonants
interspersed among the semi-vowels. In Table V-B is the similar
arrangement for the normal values of peak power for the two speakers. Data
on a larger number of sounds are available for this group, but the same
general order prevails: vowels, semi-vowels and consonants. Minor
differences in order (note "v" as in "vat") may be expected to occur
because of the influence of stress upon the conversational value. But
in both cases the ratio of the maximum to the minimum is of the order
of 2000. This similarity is striking in view of the difference in the
modes of utterance and the numbers of speakers in the two cases.

Finally, in Table V-C are shown relative values 6 derived on the basis
of relative attenuation in power required to bring the articulation (as
judged by the average ear) to 80%. Since disconnected monosyllables
were used in this test the values are normal values in our present
category. Although the same general order of the other two tables
prevails here, there are considerable differences throughout which may
well be expected since the ear is used in making the balance. The
frequency response characteristic of the ear is the complicating factor
in this case. The ratio of maximum to minimum here is of the order of
one thousand or about one-half the absolute power ratio found in the
two preceding tables.

6 Taken from the paper presented by Harvey Fletcher before the Modern
Languages Association, December 1923. Values are there called relative
"intensity," which term we avoid here because of the acoustic meaning
already assigned to intensity: power per square centimeter.

Fig. 2. Comparative Chart: Relative Normal Values of Vowel Sounds.
Legend: Peak Power; Mean Power; Relative Power Attenuation Required to
Give 80% Articulation.
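
The ratios quoted for the three columns of Table V may be checked directly from the tabulated entries. The Python sketch below does so, omitting the entries marked with a dash; the variable and function names are hypothetical, and the expression of each ratio in decibels is added here only as a convenience and does not appear in the original tables.

```python
import math

# Entries of Table V, in the order tabulated, with unavailable entries omitted.
COLUMN_A = [1870, 1380, 875, 808, 664, 616, 532, 484, 384, 366, 346, 84, 74,
            73, 58, 38, 29, 19, 14, 14, 13, 6, 3, 3, 1]
COLUMN_B = [688, 1430, 630, 632, 975, 688, 344, 402, 2170, 688, 78, 185, 192,
            87, 51, 52, 41, 162, 10, 218, 26, 7, 6, 41, 688, 63, 15, 13, 11,
            11, 1]
COLUMN_C = [826, 474, 619, 567, 364, 474, 349, 421, 924, 645, 295, 36, 38,
            216, 64, 11, 17, 98, 134, 43, 157, 32, 60, 9, 13, 347, 60, 30,
            24, 1]

def max_to_min(values):
    """Ratio of the greatest to the least entry of a column."""
    return max(values) / min(values)

def in_decibels(power_ratio):
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(power_ratio)

for name, column in (("A", COLUMN_A), ("B", COLUMN_B), ("C", COLUMN_C)):
    ratio = max_to_min(column)
    print(name, ratio, round(in_decibels(ratio), 1))

# Column C against column A: "about one-half".
print(max_to_min(COLUMN_C) / max_to_min(COLUMN_A))
```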

A more orderly comparison between power and "relative attenu- 
ation" exists in the case of the vowels alone as shown in the chart of 
Fig. 2. Thus the peak power and "relative attenuation" most nearly 
correspond at the ends of the chart (especially the left) where there is 
resonance of lower frequency in the vowels. The vowel "o" again 
shows a peculiarity in that the two trends — as shown by the envelopes 
— intersect here. Peak power predominates over "relative attenua- 
tion" in the three successive vowels "a," "a," "e," which have strong 
resonance in the region from 600 to 1200 cycles. The vowel "i" gives 
the only erratic turn in this comparison, differing considerably from 
the two adjacent vowels. 

As for loudness in the ordinary sense, let us note a phenomenon of 
rather common occurrence in these days of good quality sound repro- 
ducing apparatus. One may be listening to well reproduced speech at 
ordinary volume when suddenly a slightly accented syllable containing 
"a" (tap) comes through with noticeable overload distortion and its 
accompanying disagreeable effect upon the ear. Although the listener 
does not judge this sound to be any louder than numerous accented 
sounds preceding and following it, still the fact remains that there has 
been considerable overload due to the peaks of the wave being cut off 
by the amplifier. Where do we look for the explanation? As noted 
in the earlier paper this vowel has the highest peak factor, and we have 
already seen in Table III that it normally contains the greatest peak 
power. In spite of this, therefore, it would seem that the loudness of
this sound does not predominate over the loudness of the sounds in the
first half of the chart, as does the peak power. This phenomenon can
also be demonstrated for the vowel "e" (teem) and to a lesser degree
even for the vowels which intervene between these two in the tables
and chart of the vowel sounds.
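
The overload effect described above can be imitated with two artificial waves of equal mean power but different peak factor. In the Python sketch below, a sinusoid and a wave built of five in-phase harmonics are brought to the same root-mean-square value and passed through a hypothetical amplifier that cuts off everything beyond a fixed limit. The five-harmonic wave is merely a stand-in for a sound of high peak factor and is not offered as a model of the vowel "a" (tap); the limit of 2.0 and all names are assumptions.

```python
import math

def normalise_rms(wave):
    """Scale a wave to unit root-mean-square value (equal mean power)."""
    rms = math.sqrt(sum(x * x for x in wave) / len(wave))
    return [x / rms for x in wave]

def clipped_fraction(wave, limit):
    """Fraction of the samples that an amplifier limited to +/- limit
    would cut off."""
    return sum(1 for x in wave if abs(x) > limit) / len(wave)

N = 2000  # samples in one fundamental cycle (assumed)
sine = normalise_rms([math.sin(2 * math.pi * k / N) for k in range(N)])
peaky = normalise_rms([
    sum(math.cos(2 * math.pi * h * k / N) for h in range(1, 6))
    for k in range(N)
])

LIMIT = 2.0  # hypothetical overload point, in units of the common rms value

print("sinusoid peak:", max(abs(x) for x in sine),
      "clipped:", clipped_fraction(sine, LIMIT))
print("peaky wave peak:", max(abs(x) for x in peaky),
      "clipped:", clipped_fraction(peaky, LIMIT))
```

Both waves deliver the same mean power, yet only the wave of high peak factor reaches the overload point, which is in keeping with the observation that overload may occur on a sound that is not judged to be especially loud.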



